word2vec Parameter Learning Explained

Author

  • Xin Rong
Abstract

The word2vec model and application by Mikolov et al. have attracted a great amount of attention in the past two years. The vector representations of words learned by word2vec models have been shown to carry semantic meanings and to be useful in various NLP tasks. As an increasing number of researchers would like to experiment with word2vec or similar techniques, I notice that there is a lack of material that comprehensively explains the parameter learning process of word embedding models in detail, which prevents researchers who are not experts in neural networks from understanding the working mechanism of such models. This note provides detailed derivations and explanations of the parameter update equations of the word2vec models, including the original continuous bag-of-words (CBOW) and skip-gram (SG) models, as well as advanced optimization techniques, including hierarchical softmax and negative sampling. Intuitive interpretations of the gradient equations are provided alongside the mathematical derivations. In the appendix, a review of the basics of neural networks and backpropagation is provided. I also created an interactive demo, wevi, to facilitate an intuitive understanding of the model.[1]

1 Continuous Bag-of-Words Model

1.1 One-word context

We start from the simplest version of the continuous bag-of-words model (CBOW) introduced in Mikolov et al. (2013a). We assume that only one word is considered per context, which means the model predicts one target word given one context word, much like a bigram model. Readers who are new to neural networks are encouraged to go through Appendix A for a quick review of the important concepts and terminology before proceeding further.

Figure 1 shows the network model under this simplified context definition.[2] In our setting, the vocabulary size is V and the hidden layer size is N. The units on adjacent layers are fully connected.

[Figure 1: A simple CBOW model with one word in the context: an input layer (x1, x2, x3, ..., xk, ...), a hidden layer, and an output layer.]

[1] An online interactive demo is available at: http://bit.ly/wevi-online.
[2] In Figures 1, 2, 3, and the rest of this note, W′ is not the transpose of W, but a different matrix instead.
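To make the architecture concrete before the derivations, the following is a minimal numpy sketch of the one-word-context model: a one-hot input selects a row of W as the hidden layer, W′ maps the hidden layer to a score per vocabulary word, and a softmax yields the posterior p(w_j | w_I). The vocabulary size, hidden layer size, learning rate, random seed, and example training pair are illustrative assumptions, not values from the note.

import numpy as np

V, N = 8, 4                                   # toy vocabulary size and hidden layer size (assumed)
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(V, N))        # input->hidden weights; row i is the input vector of word i
W_prime = rng.normal(scale=0.1, size=(N, V))  # hidden->output weights (NOT the transpose of W)

def train_pair(input_idx, target_idx, lr=0.1):
    """One stochastic gradient step on a (context word, target word) pair."""
    global W_prime                            # rebound below by the update
    h = W[input_idx].copy()                   # hidden layer h = W^T x for a one-hot input x
    u = h @ W_prime                           # score u_j for every word in the vocabulary
    y = np.exp(u - u.max())
    y /= y.sum()                              # softmax posterior p(w_j | w_I)
    e = y.copy()
    e[target_idx] -= 1.0                      # prediction error e_j = y_j - t_j
    EH = W_prime @ e                          # error back-propagated to the hidden layer
    W_prime = W_prime - lr * np.outer(h, e)   # update all output word vectors
    W[input_idx] -= lr * EH                   # update only the input vector of the context word

# Example: one update nudging word 2 toward predicting word 5.
train_pair(2, 5)

Note that each step updates every output word vector, which is exactly the cost that the hierarchical softmax and negative sampling techniques discussed later are designed to reduce.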


Similar Articles

word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method

The word2vec software of Tomas Mikolov and colleagues has gained a lot of traction lately, and provides state-of-the-art word embeddings. The learning models behind the software are described in two research papers [1, 2]. We found the description of the models in these papers to be somewhat cryptic and hard to follow. While the motivations and presentation may be obvious to the neural-networks...


De-identification In practice

We report our effort to identify sensitive information, a subset of the data items listed by HIPAA (the Health Insurance Portability and Accountability Act), in medical text using recent advances in natural language processing and machine learning techniques. We represent the words with high-dimensional continuous vectors learned by a variant of Word2Vec called Continuous Bag of Words (CBOW). We feed ...


Using word2vec to Build a Simple Ontology Learning System

Ontology learning has been an important research area in the Semantic Web field in the last 20 years. Ontology learning systems generate domain models from data (typically text) using a combination of sophisticated methods. In this poster, we study the use of Google’s word2vec to emulate a simple ontology learning system, and compare the results to an existing “traditional” ontology learning sy...


Bag of Words Meets Bags of Popcorn

This problem is selected from one of Kaggle's competitions [2]. In this problem, we dig a little "deeper" into sentiment analysis. Word2Vec is a deep-learning-inspired method that focuses on the meaning of words. Word2Vec [3] attempts to understand meaning and semantic relationships among words. It works in a way that is similar to deep approaches, such as recurrent neural nets or deep ne...


Improving Precision of Keywords Extracted From Persian Text Using Word2Vec Algorithm

According to the model, keywords can present the main concepts of a text without human intervention. Keywords are important vocabulary words that describe the text and play a very important role in accurate and fast understanding of its content. The purpose of extracting keywords is to identify the subject and the main content of the text in the shortest time. Keyword extraction pl...



Journal: CoRR

Volume: abs/1411.2738

Publication date: 2014